Unit selection in concatenative TTS synthesis systems based on mel filter bank amplitudes and phonetic context

Authors

  • Tanya Lambert
  • Andrew P. Breen
  • Barry Eggleton
  • Stephen J. Cox
  • Ben P. Milner
Abstract

In concatenative text-to-speech (TTS) synthesis systems, unit selection aims to reduce the number of concatenation points in the synthesized speech and to make concatenation joins as smooth as possible. This research considers synthesis of completely new utterances from non-uniform units, whereby the most appropriate units, according to acoustic and phonetic criteria, are selected from a myriad of similar speech database candidates. A Viterbi-style algorithm dynamically selects the most suitable units from a large speech database by considering concatenation and target costs. Concatenation costs are derived from mel filter bank amplitudes, whereas target costs are considered in terms of the phonemic and phonetic properties of required units. Within-subjects and between-subjects ANOVA [9] evaluation of listeners' scores showed that the TTS system with this method of unit selection was preferred in 52% of test sentences.
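The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the `Unit` fields, the target cost (a count of mismatched left and right phonetic contexts), the concatenation cost (Euclidean distance between mel filter bank amplitude vectors on either side of a join), and the weights. It also assumes one non-empty candidate list per target position, so it does not model non-uniform unit lengths.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit; all fields are illustrative assumptions."""
    phone: str
    left_ctx: str
    right_ctx: str
    start_mfb: tuple  # mel filter bank amplitudes at the unit's first frame
    end_mfb: tuple    # mel filter bank amplitudes at the unit's last frame


def target_cost(target, unit):
    # Assumed cost: number of mismatched phonetic contexts.
    # `target` is a (left_ctx, phone, right_ctx) triple.
    left, _, right = target
    return float(left != unit.left_ctx) + float(right != unit.right_ctx)


def concat_cost(prev_unit, next_unit):
    # Assumed cost: Euclidean distance between the amplitude vectors at the join.
    return math.dist(prev_unit.end_mfb, next_unit.start_mfb)


def select_units(targets, candidates, w_target=1.0, w_concat=1.0):
    """Viterbi-style search; candidates[t] lists the units available for targets[t]."""
    # Cheapest cost of any path ending in each candidate of the first position.
    best = [w_target * target_cost(targets[0], u) for u in candidates[0]]
    back = []  # back[t - 1][i] = index of the best predecessor of candidates[t][i]
    for t in range(1, len(targets)):
        new_best, pointers = [], []
        for unit in candidates[t]:
            costs = [best[j] + w_concat * concat_cost(prev, unit)
                     for j, prev in enumerate(candidates[t - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + w_target * target_cost(targets[t], unit))
            pointers.append(j_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last position.
    idx = min(range(len(best)), key=best.__getitem__)
    total = best[idx]
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)], total
```

For each target position the search keeps only the cheapest path ending in each candidate, so the work grows with the number of candidate pairs at adjacent positions rather than with the number of complete unit sequences.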


Similar resources

ACTOR: A multilingual unit-selection speech synthesis system

The ACTOR® Text-To-Speech (TTS) synthesis system, developed at Loquendo S.p.A., is described here. The system employs a unit-selection concatenative synthesis technique, relying on labeled acoustic databases providing phonetic and prosodic coverage of the intended language/domain and on an original algorithm for run-time selection of the acoustic units to be concatenated. This technique yields...


Advances in Spectral Parameterization for Statistical (HMM-Based) TTS

HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when low footprint and general speech domain are required. A majority of speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...


An embedded and concatenative approach to TTS of multiple languages

This paper presents an embedded and concatenative approach to multilingual text-to-speech system (ECMTTS). Under a uniform architecture, the TTS modules are separated into language dependent and independent ones. A specifically defined super phonetic symbol set enables to use uniform spee...


A Corpus-Based Concatenative Speech Synthesis System for Turkish

Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...


Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems

In this paper, a data pruning approach is presented for building acoustic unit inventory for syllable-based concatenative embeddable Chinese TTS system. A 3-portion segmentation of a syllable is proposed based on the nature of voiced/unvoiced structure of Chinese syllable. Individual factorial acoustic measurement of syllable is used to calculate the penalty of perceptual unsatisfactory for con...




Publication date: 2003